
Exploring CLIP for Real World, Text-based Image Retrieval

https://doi.org/10.1109/aipr60534.2023.10440710

Sultan, Manal; Jacobs, Lia; Stylianou, Abby; Pless, Robert (September 2023, IEEE)

Abstract: We consider the ability of CLIP features to support text-driven image retrieval. Traditional image-based queries sometimes misalign with user intentions because they focus on irrelevant image components. To overcome this, we explore the potential of text-based image retrieval, specifically using Contrastive Language-Image Pretraining (CLIP) models. CLIP models, trained on large datasets of image-caption pairs, offer a promising approach by allowing natural language descriptions for more targeted queries. We explore the effectiveness of text-driven image retrieval based on CLIP features by evaluating the image similarity for progressively more detailed queries. We find that there is a sweet spot of detail in the text that gives the best results, and that words describing the “tone” of a scene (such as “messy” or “dingy”) are quite important in maximizing text-image similarity.
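
As a rough illustration of the retrieval step described in the abstract, the sketch below ranks images by cosine similarity between a text embedding and each image embedding, once for a short query and once for a more detailed one. It is not the authors' code. It assumes the embeddings have already been produced by a CLIP model; the three-dimensional vectors, the image identifiers, the query strings, and the function names `cosine_similarity` and `rank_images` are all hypothetical placeholders chosen for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_images(text_embedding, image_embeddings):
    """Return (image_id, similarity) pairs, most similar first."""
    scores = [
        (image_id, cosine_similarity(text_embedding, embedding))
        for image_id, embedding in image_embeddings.items()
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Made-up 3-d vectors standing in for real CLIP image embeddings.
image_embeddings = {
    "room_a": [0.9, 0.1, 0.0],
    "room_b": [0.1, 0.9, 0.2],
}

# Made-up text embeddings for a short query and a more detailed one.
query_embeddings = {
    "a room": [0.6, 0.5, 0.1],
    "a messy room": [0.2, 0.9, 0.1],
}

for query, embedding in query_embeddings.items():
    best_id, best_score = rank_images(embedding, image_embeddings)[0]
    print(f"{query!r}: top match {best_id} (similarity {best_score:.3f})")
```

With real CLIP embeddings, the same loop could be run over a series of progressively longer descriptions of one target image to see how the similarity score changes as detail is added.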

Full Text Available
